AI Memo AIM-2003-019 Permutation Tests for Classification

Authors

Sayan Mukherjee

Polina Golland

Dmitry Panchenko

Pablo Tamayo

Vladimir Koltchinskii

Jill Mesirov

Abstract

We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
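
The permutation procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the memo's implementation. The nearest-centroid classifier, the leave-one-out accuracy estimate, the permutation count, and the helper names `nearest_centroid_loo_accuracy` and `permutation_test` are all hypothetical stand-ins. The sketch computes the observed accuracy, then recomputes it many times after randomly permuting the class labels. Each permutation keeps the data fixed but breaks any real dependence between patterns and labels. The estimated significance is the fraction of permuted runs that reach at least the observed accuracy.

```python
import numpy as np


def nearest_centroid_loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Hypothetical stand-in classifier; the memo's classifiers may differ.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i                 # hold out example i
        X_train, y_train = X[mask], y[mask]
        classes = np.unique(y_train)
        centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[i], axis=1)
        if classes[np.argmin(dists)] == y[i]:
            correct += 1
    return correct / n


def permutation_test(X, y, accuracy_fn, n_permutations=1000, seed=0):
    """Estimate how likely the observed accuracy is under randomly permuted labels."""
    rng = np.random.default_rng(seed)
    observed = accuracy_fn(X, y)
    null_acc = np.empty(n_permutations)
    for k in range(n_permutations):
        # Shuffling labels keeps X fixed but destroys any true data-label association.
        null_acc[k] = accuracy_fn(X, rng.permutation(y))
    # (1 + count) / (K + 1): an assumed convention that keeps the p-value above zero.
    p_value = (1 + np.sum(null_acc >= observed)) / (n_permutations + 1)
    return observed, null_acc, p_value


# Usage: 20 examples in 500 dimensions, with a planted shift in 50 features.
data_rng = np.random.default_rng(42)
n_per_class, n_features = 10, 500
X = data_rng.normal(size=(2 * n_per_class, n_features))
y = np.repeat([0, 1], n_per_class)
X[y == 1, :50] += 3.0

obs, null, p = permutation_test(X, y, nearest_centroid_loo_accuracy,
                                n_permutations=200, seed=1)
print(f"observed accuracy = {obs:.2f}, permutation p-value = {p:.4f}")
```

With a strong planted signal, the observed accuracy should sit well above the permutation distribution, so the p-value approaches its floor of 1/(K + 1). If the features carried no label information, the observed accuracy would typically fall inside the permuted distribution. Leave-one-out accuracy is used here only because it suits very small samples. Any accuracy estimate can be passed as `accuracy_fn`, provided the same estimate is applied to both the true and the permuted labels.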

Similar resources

Logic and Artificial Intelligence: Divorced, Still Married, Separated . . . ?

Though it's difficult to agree on the exact date of their union, logic and artificial intelligence (AI) were married by the late 1950s, and, at least during their honeymoon, were happily united. What connubial permutation do logic and AI find themselves in now? Are they still (happily) married? Are they divorced? Or are they only separated, both still keeping alive the promise of a future in which t...

Concept Drift Detection and Model Selection with Simulated Recurrence and Ensembles of Statistical Detectors

The paper presents a concept drift detection method for unsupervised learning which takes into consideration the prior knowledge to select the most appropriate classification model. The prior knowledge carries information about the data distribution patterns that reflect different concepts, which may occur in the data stream. The presented method serves as a temporary solution for a classificat...

Permutation Tests for Classification: Towards Statistical Significance in Image-Based Studies

Estimating statistical significance of detected differences between two groups of medical scans is a challenging problem due to the high dimensionality of the data and the relatively small number of training examples. In this paper, we demonstrate a non-parametric technique for estimation of statistical significance in the context of discriminative analysis (i.e., training a classifier function...

Validating cluster size inference: random field and permutation methods.

Cluster size tests used in analyses of brain images can have more sensitivity compared to intensity based tests. The random field (RF) theory has been widely used in implementation of such tests, however the behavior of such tests is not well understood, especially when the RF assumptions are in doubt. In this paper, we carried out a simulation study of cluster size tests under varying smoothne...

Statistical tests for fMRI based on experimental randomization.

Statistical parametric mapping (SPM) analysis of fMRI data requires specifying correctly the temporal and spatial noise covariance structure. This is a difficult if not impossible task. When these assumptions are not satisfied, statistical inference can be invalid or inefficient. Permutation tests are free of strong assumptions on the distribution of signal noise. We propose permutation tests o...

Publication date: 2003